This exploratory analysis examines both the number of dimensions in our statistical performance indicators and whether our quintile-based approach to grouping countries is sensible.
We apply two methods. First, we use principal components analysis (PCA) to examine how many relevant dimensions are present across our 54 SPI indicators. Second, we compare our SPI quintile groups to groups formed using K-means clustering.
Takeaways:
There appears to be a single dominant dimension according to PCA that accounts for roughly 33% of the variation in our indicators. This primary principal component is very strongly related to our SPI overall score (\(R^2=0.96\)). There is also a second dimension that explains around 10% of the variation.
K-means cluster analysis with 3 to 5 clusters seems to roughly line up with our SPI quintile groups, although K-means clustering tends to introduce differences along the second PCA dimension, which our SPI quintile groups do not.
We start by assessing the dimensionality of the SPI indicator data. We apply principal components analysis to our 54 SPI indicators using data from 2019. We begin with a scree plot, which shows the share of variation explained by each principal component and helps decide how many components to keep. There appears to be a single dominant principal component, with a second dimension that is less important but accounts for around 10% of the variation.
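The quantities plotted in a scree plot can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the figures: the data are synthetic (a single latent factor plus noise, not the SPI indicators), the function name `explained_variance_ratio` is hypothetical, and the indicators are standardized before the decomposition, which may differ from the choice made in the original analysis.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance captured by each principal component."""
    # Assumption: standardize each indicator so scale differences do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized matrix give component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s**2 / (Z.shape[0] - 1)
    return var / var.sum()

# Synthetic stand-in for a country-by-indicator matrix (NOT the SPI data).
rng = np.random.default_rng(0)
n_countries, n_indicators = 200, 54
latent = rng.normal(size=(n_countries, 1))
loadings = rng.normal(size=(1, n_indicators))
X = latent @ loadings + rng.normal(size=(n_countries, n_indicators))

ratios = explained_variance_ratio(X)
# The first few values are what a scree plot would display, in order.
print(np.round(ratios[:5], 3))
```

Plotting `ratios` against the component index gives the scree plot; a sharp drop after the first value is the pattern described above.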
Next we look at how well the first principal component maps to our SPI overall score. The fit is quite strong: the \(R^2\) in a regression of the SPI overall score on the first principal component is around 0.96.
This is some evidence that our SPI overall score is picking up the bulk of the variation between countries, since it is highly correlated with a dominant 1st principal component.
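The comparison between the overall score and the first principal component can be sketched as a simple regression. This is only an illustration: the data are synthetic, the helper names `first_pc_scores` and `r_squared` are hypothetical, and the overall score here is assumed to be a plain average of the indicators, which is not necessarily how the SPI overall score is built.

```python
import numpy as np

def first_pc_scores(X):
    """Scores of each observation on the first principal component."""
    # Assumption: indicators are standardized before the decomposition.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * s[0]

def r_squared(y, x):
    """R squared from an OLS regression of y on x with an intercept."""
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Synthetic data (NOT the SPI data): one latent factor plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ rng.uniform(0.5, 1.5, size=(1, 54)) + rng.normal(size=(200, 54))

overall = X.mean(axis=1)   # hypothetical overall score: plain average
pc1 = first_pc_scores(X)
r2 = r_squared(overall, pc1)
print(round(float(r2), 3))
```

With a single regressor, this \(R^2\) equals the squared correlation between the score and the component, so the sign of the component (which is arbitrary in PCA) does not affect it.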
Next, we assess whether our SPI quintile groups are sensible groupings of countries. To do this, we compare our SPI groups to those formed using K-means clustering. For a given number of clusters, the K-means algorithm looks for the groups of observations (in our case countries) that minimize the within-group variance in our 54 indicators.
In the analysis below, we explore both the optimal number of clusters and whether our SPI groupings based on the quintile of the SPI overall score match these K-means clusters.
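The idea of minimizing within-group variance can be sketched with Lloyd's algorithm, the standard iterative procedure for K-means. This is a bare-bones illustration on synthetic two-dimensional data, not the routine used for the results reported here; the function names `kmeans` and `within_ss` are hypothetical, and a production analysis would normally use a library implementation with several random restarts.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and center updates."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        # Move each center to the mean of its cluster (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(X, centers), centers

def within_ss(X, labels, centers):
    """Total within-cluster sum of squares, the quantity K-means minimizes."""
    return float(((X - centers[labels]) ** 2).sum())

# Synthetic data (NOT the SPI data): three separated groups in two dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (-3.0, 0.0, 3.0)])

labels1, centers1 = kmeans(X, k=1)
labels3, centers3 = kmeans(X, k=3)
wss1 = within_ss(X, labels1, centers1)
wss3 = within_ss(X, labels3, centers3)
print(wss1, wss3)
```

Comparing the within-cluster sum of squares across values of k is one common way to judge how many clusters the data support, alongside visual inspection.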
We examine different numbers of clusters below and visually inspect which seems to provide sensible clusters. We consider k = 8, 5, 4, and 3.
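One simple way to complement the visual inspection is to cross-tabulate quintile groups against cluster labels. The sketch below is an assumption-laden illustration: the scores and the cluster labels are both synthetic (the cluster labels are made up rather than produced by K-means), and the helper names `quintile_groups` and `crosstab` are hypothetical.

```python
import numpy as np

def quintile_groups(scores):
    """Assign each score to a quintile group, 0 (lowest) to 4 (highest)."""
    cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    return np.digitize(scores, cuts)

def crosstab(a, b):
    """Contingency table counting how often label a[i] co-occurs with b[i]."""
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    np.add.at(table, (a, b), 1)
    return table

# Synthetic overall scores (NOT the SPI data).
rng = np.random.default_rng(0)
scores = rng.uniform(0, 100, size=200)

# Hypothetical cluster labels (4 clusters) loosely related to the score.
noisy = scores + rng.normal(scale=5, size=200)
clusters = np.digitize(noisy, [25, 50, 75])

groups = quintile_groups(scores)
table = crosstab(groups, clusters)
print(table)   # rows: quintile groups, columns: clusters
```

If the quintile groups and the clusters agree, most of the counts in each row concentrate in one or two columns; counts spread across many columns would suggest the clusters separate countries along something other than the overall score.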